📝 The Challenge
Inaccurate ETAs are a primary source of operational friction in the maritime industry, leading to increased costs, scheduling conflicts, and inefficient port operations. This module addresses this by moving beyond simple calculations to a data-driven prediction model that accounts for real-world, dynamic variables.

Data Flow & Processing

The ETA prediction model relies on a robust, five-stage data pipeline to ensure accuracy and reliability, transforming raw inputs into actionable insights.

Core Algorithms & Calculations

The ETA prediction model is built on a comprehensive data pipeline that transforms raw, diverse inputs into a refined, reliable forecast. This process involves five key stages, from initial data ingestion to the final, dashboard-ready output.

1. Data Ingestion: Weather & Route Polling

The process begins by polling external APIs for essential voyage context. The system is designed to periodically query Navtor for detailed routing information and Stormglass for real-time and forecasted weather conditions along the vessel’s planned route. Algorithm Logic:
  • Establish secure connections to Navtor and Stormglass APIs.
  • Request route data for a specific vessel and voyage.
  • Request weather parameters (wind speed, wave height, currents) for the specific geographical points along the route.
  • Store the raw, unstructured JSON responses for processing.
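The polling steps above can be sketched with the standard library. The Stormglass endpoint and parameter names below follow its public weather API; the weather parameters chosen and the way a scheduler would drive this function are illustrative assumptions, since the Navtor side of the integration is not shown here.

```python
import json
import urllib.parse
import urllib.request

STORMGLASS_URL = "https://api.stormglass.io/v2/weather/point"

def build_weather_query(lat, lng, params=("windSpeed", "waveHeight", "currentSpeed")):
    # Stormglass expects lat/lng plus a comma-separated list of weather parameters
    return {"lat": lat, "lng": lng, "params": ",".join(params)}

def poll_weather(lat, lng, api_key):
    # Authenticate with the API key in the Authorization header,
    # then return the raw JSON response for the extraction stage to store
    query = urllib.parse.urlencode(build_weather_query(lat, lng))
    request = urllib.request.Request(
        f"{STORMGLASS_URL}?{query}",
        headers={"Authorization": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

A scheduler (cron, Celery beat, or similar) would call `poll_weather` for each waypoint on the planned route at a fixed interval.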

2. Data Extraction: Email & Noon Report Parsing

A significant portion of operational data arrives in unstructured formats, such as noon report emails. The system uses a sophisticated parsing engine to extract critical information from this text. Algorithm Logic:
  • Monitor a designated inbox for incoming noon report emails.
  • Use regular expressions (regex) and keyword matching to identify and isolate key data points (e.g., “SOG:”, “Remaining Dist:”, “ETA:”).
  • Extract values for vessel speed, remaining distance, fuel consumption, and the reported ETA.
  • Temporarily store this extracted, key-value data for the next stage.
import re

def parse_noon_report_email(email_body):
    # Use regex to find key-value pairs in the email text
    # Note: These are simplified patterns for illustration
    sog_pattern = re.compile(r"SOG:\s*([\d\.]+)\s*knots")
    eta_pattern = re.compile(r"ETA:\s*(\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2})")
    
    sog_match = sog_pattern.search(email_body)
    eta_match = eta_pattern.search(email_body)
    
    extracted_data = {
        "SOG": sog_match.group(1) if sog_match else None,
        "ETA": eta_match.group(1) if eta_match else None,
        # ... other extracted fields
    }
    
    return extracted_data
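The patterns above can be exercised against a hypothetical report body (the field layout is illustrative, not a real report format):

```python
import re

sample = "Noon Report\nSOG: 12.5 knots\nRemaining Dist: 840 nm\nETA: 2024-05-03 06:00"

# The same simplified patterns used in the parser
sog = re.search(r"SOG:\s*([\d\.]+)\s*knots", sample)
eta = re.search(r"ETA:\s*(\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2})", sample)

print(sog.group(1))  # 12.5
print(eta.group(1))  # 2024-05-03 06:00
```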
Alternative Method: GPT-Based Parsing. For more complex or less structured reports, a Large Language Model (LLM) can be used for more robust and flexible data extraction.
import json
import openai

def parse_with_gpt(email_body):
    # Prepare a prompt that instructs the model to extract key information
    prompt = f"""
    Extract the following entities from this noon report:
    - Speed Over Ground (SOG) in knots
    - Estimated Time of Arrival (ETA) in YYYY-MM-DD HH:MM format

    Report: "{email_body}"

    Return the result as a JSON object.
    """

    # Call the OpenAI chat completions API (any other LLM provider would work similarly)
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )

    # The model's reply is a JSON string that can be parsed
    return json.loads(response.choices[0].message.content)

3. Data Standardization & Enrichment

Once data is extracted, it must be converted into a standardized, structured format. This stage involves cleaning the data, converting units, and enriching it with information from other sources. Code Implementation:
def standardize_report_data(extracted_data, vessel_id):
    # Convert speed from knots to a standard float, guarding against missing values
    raw_speed = extracted_data.get("SOG")
    standardized_speed = float(raw_speed) if raw_speed is not None else None
    
    # Standardize date and time formats to UTC
    reported_eta_str = extracted_data.get("ETA")
    standardized_eta = convert_to_utc(reported_eta_str)
    
    # Enrich with data from other sources
    vessel_dwt = get_vessel_particulars(vessel_id)
    
    # Create a clean, structured data object
    structured_report = {
        "speed_knots": standardized_speed,
        "eta_utc": standardized_eta,
        "vessel_dwt": vessel_dwt,
        # ... other fields
    }
    return structured_report
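The `convert_to_utc` helper referenced above is not shown. One possible sketch, under the assumption that reported ETAs arrive stamped in a known local zone (here `Asia/Singapore`, purely for illustration):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def convert_to_utc(eta_str, source_tz="Asia/Singapore"):
    # Parse the "YYYY-MM-DD HH:MM" string, attach the assumed local zone,
    # then normalize to UTC; real reports should carry their own zone or offset
    local = datetime.strptime(eta_str, "%Y-%m-%d %H:%M")
    return local.replace(tzinfo=ZoneInfo(source_tz)).astimezone(timezone.utc)

print(convert_to_utc("2024-05-03 06:00"))  # 2024-05-02 22:00:00+00:00
```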

4. Data Validation: Time-Based Correction

A key source of ETA error stems from the timing of noon report submissions. Reports filed after midday can be incorrectly timestamped to the following day. The system applies a specific logical check to correct this. Algorithm Logic: The algorithm checks the timestamp of each incoming noon report. If the report’s time is after 12:00 PM (noon), but the associated date has been advanced to the next day, the algorithm corrects the date back to the actual day of submission. Code Implementation:
from datetime import time

def correct_noon_report_date(report, actual_submission_date):
    # Check if the report time is post-meridian (after 12:00 PM)
    is_after_noon = report.time > time(12, 0)
    
    # Check if the date has been incorrectly advanced
    date_is_advanced = report.date > actual_submission_date
    
    if is_after_noon and date_is_advanced:
        # Revert the date to the correct day
        report.date = actual_submission_date
        
    return report
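The rule can be checked on a hypothetical boundary case: a report filed at 18:00 on 1 May whose date has slipped to 2 May.

```python
from datetime import date, time

reported_date, reported_time = date(2024, 5, 2), time(18, 0)
actual_submission_date = date(2024, 5, 1)

# After-noon report whose date has been advanced: pull it back
if reported_time > time(12, 0) and reported_date > actual_submission_date:
    reported_date = actual_submission_date

print(reported_date)  # 2024-05-01
```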

5. Voyage Status & Geolocation Processing

After the primary ETA is calculated, the data is further enriched with voyage status and geolocation information to provide a complete operational picture on the dashboard. Geolocation Mapping: This function maps the vessel’s current location to a standardized geographical region for easier tracking and filtering. Code Implementation:
def map_location(latitude, longitude):
    # This function would contain logic to map coordinates to defined regions
    # Example:
    if 30.0 < latitude < 60.0 and -30.0 < longitude < 0.0:
        return "North Atlantic"
    elif 25.0 < latitude < 45.0 and 35.0 < longitude < 65.0:
        return "Arabian Sea"
    else:
        return "Unknown Region"
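As regions accumulate, the chain of range checks grows awkward; a table-driven variant keeps the mapping declarative. The bounding boxes below mirror the example above and are illustrative only.

```python
# Each entry: region name, (lat_min, lat_max), (lon_min, lon_max)
REGIONS = [
    ("North Atlantic", (30.0, 60.0), (-30.0, 0.0)),
    ("Arabian Sea", (25.0, 45.0), (35.0, 65.0)),
]

def map_location(latitude, longitude):
    # Return the first region whose bounding box contains the point
    for name, (lat_lo, lat_hi), (lon_lo, lon_hi) in REGIONS:
        if lat_lo < latitude < lat_hi and lon_lo < longitude < lon_hi:
            return name
    return "Unknown Region"

print(map_location(45.0, -20.0))  # North Atlantic
print(map_location(0.0, 0.0))    # Unknown Region
```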
Voyage Status Creation: The system generates a dynamic, human-readable status for each voyage based on its current operational data. Code Implementation:
def create_voyage_status(voyage_data):
    # This function would create a status string based on vessel activity
    # Example:
    if (voyage_data.get('speed_knots') or 0) > 1:
        status = f"En route to {voyage_data.get('destination_port')}"
    else:
        status = f"Alongside at {voyage_data.get('current_port')}"
    
    return status
Final Data Processing: This function orchestrates the final data processing steps, bringing together all the calculated and enriched data points into a final, dashboard-ready object. Code Implementation:
def process_eta_data(voyage_id):
    # This function would be the main orchestrator for a single voyage
    # 1. Fetch the latest validated report data
    report_data = get_validated_report(voyage_id)
    
    # 2. Calculate the live ETA
    live_eta = calculate_live_eta(report_data)
    
    # 3. Map the location
    geo_location = map_location(report_data.get('lat'), report_data.get('lon'))
    
    # 4. Create a human-readable status
    voyage_status = create_voyage_status(report_data)
    
    # 5. Assemble the final data object for the dashboard
    dashboard_payload = {
        "voyage_id": voyage_id,
        "live_eta_utc": live_eta,
        "location_region": geo_location,
        "current_status": voyage_status,
        "last_updated": get_current_utc_time()
    }
    
    return dashboard_payload